Abstract: A new upper bound on the relative entropy is derived
as a function of the total variation distance for probability
measures defined on a common finite alphabet. The bound
improves a previously reported bound by Csiszár and Talata. It
is further extended to an upper bound on the Rényi divergence
of an arbitrary non-negative order (including $\infty$) as a function
of the total variation distance.

1. Introduction


Consider two probability distributions $P$ and $Q$ defined
on a common measurable space $(\mathcal{A}, \mathcal{F})$. The
Csiszár-Kemperman-Kullback-Pinsker inequality (a.k.a. Pinsker's
inequality) states that

$$\tfrac{1}{2} \, |P-Q|^2 \log e \le D(P\|Q) \tag{1}$$

where

$$D(P\|Q) = \mathbb{E}_P\!\left[\log \frac{\mathrm{d}P}{\mathrm{d}Q}\right] \tag{2}$$

designates the relative entropy (a.k.a. the Kullback-Leibler
divergence) from $P$ to $Q$, and

$$|P-Q| = 2 \sup_{A \in \mathcal{F}} |P(A) - Q(A)| \tag{3}$$

is the total variation distance between $P$ and $Q$.

A “reverse Pinsker inequality” providing an upper bound 
on the relative entropy in terms of the total variation distance 
does not exist in general since we can find distributions that 
are arbitrarily close in total variation but with arbitrarily 
high relative entropy. Nevertheless, it is possible to introduce 
constraints under which such reverse Pinsker inequalities can 
be obtained. In the case where the probability measures $P$
and $Q$ are defined on a common discrete (i.e., finite or
countable) set $\mathcal{A}$,

$$D(P\|Q) = \sum_{a \in \mathcal{A}} P(a) \log \frac{P(a)}{Q(a)} \tag{4}$$

$$|P-Q| = \sum_{a \in \mathcal{A}} |P(a) - Q(a)| \tag{5}$$

The total variation distance is bounded, $|P-Q| \le 2$, whereas
the relative entropy is an unbounded information measure.
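
As a small numerical illustration of (1), (4) and (5), the following sketch evaluates both quantities for discrete distributions. It assumes natural logarithms (so that $\log e = 1$ and the relative entropy is in nats); the function names and the example distributions are illustrative choices, not taken from the paper.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) in nats, as in (4); p and q are equal-length probability lists."""
    total = 0.0
    for pa, qa in zip(p, q):
        if pa > 0.0:
            if qa == 0.0:
                return math.inf  # P puts mass where Q has none
            total += pa * math.log(pa / qa)
    return total

def total_variation(p, q):
    """|P - Q| as in (5); the value lies in [0, 2]."""
    return sum(abs(pa - qa) for pa, qa in zip(p, q))

# Pinsker's inequality (1) in nats: 0.5 * |P - Q|^2 <= D(P||Q).
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
tv = total_variation(p, q)
kl = relative_entropy(p, q)
pinsker_holds = 0.5 * tv ** 2 <= kl

# No reverse inequality can hold in general: these two distributions are
# close in total variation, yet the relative entropy is infinite.
tv_close = total_variation([0.999, 0.001], [1.0, 0.0])
kl_far = relative_entropy([0.999, 0.001], [1.0, 0.0])
```

The second pair shows why additional constraints (such as a strictly positive $Q$ on a finite alphabet) are needed before an upper bound in terms of $|P-Q|$ can exist.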

Improved versions of Pinsker’s inequality were studied, 
e.g., in 0, Co), m, ini, ED. 

In the case of a finite alphabet $\mathcal{A}$, Csiszár and
Talata [6, p. 1012] show that

$$D(P\|Q) \le \left(\frac{\log e}{Q_{\min}}\right) |P-Q|^2 \tag{6}$$

where

$$Q_{\min} = \min_{a \in \mathcal{A}} Q(a). \tag{7}$$

Recent applications of (6) can be found in m Appendix D]
and ll^ Lemma 7] for the analysis of the third-order
asymptotics of the discrete memoryless channel with
or without cost constraints.

In addition to $Q_{\min}$ in (7), the bounds in this paper involve

$$\beta_1 = \min_{a \in \mathcal{A}} \frac{Q(a)}{P(a)} \tag{8}$$

$$\beta_2 = \min_{a \in \mathcal{A}} \frac{P(a)}{Q(a)} \tag{9}$$

so $\beta_1, \beta_2 \in [0,1]$.

In this paper, Section 2 derives a reverse Pinsker inequality
for probability measures defined on a common finite set,
improving the bound in (6). The utility of this inequality
is studied in Section 3, and it is extended in Section 4 to
Rényi divergences of an arbitrary non-negative order.

2. A New Reverse Pinsker Inequality for 
Distributions on a Finite Set 


One of the implications of (1) is that convergence in relative
entropy implies convergence in total variation distance.
The present section introduces a strengthened version of
(6), followed by some remarks and an example.










A. Main Result and Proof 

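Theorem 1. Let $P$ and $Q$ be probability measures defined on
a common finite set $\mathcal{A}$, and assume that $Q$ is strictly positive
on $\mathcal{A}$. Then, the following inequality holds:

$$D(P\|Q) \le \log\left(1 + \frac{|P-Q|^2}{2 Q_{\min}}\right) - \frac{\beta_2 \log e}{2} \, |P-Q|^2 \tag{10}$$

$$\phantom{D(P\|Q)} \le \log\left(1 + \frac{|P-Q|^2}{2 Q_{\min}}\right) \tag{11}$$

where $Q_{\min}$ and $\beta_2$ are given in (7) and (9), respectively.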
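
The bounds of Theorem 1 are easy to evaluate numerically. The sketch below, which assumes natural logarithms ($\log e = 1$) and uses an illustrative helper name and example distributions that are not from the paper, computes $D(P\|Q)$ together with (10), (11) and the Csiszár-Talata bound (6).

```python
import math

def theorem1_bounds(p, q):
    """Return (D, bound_10, bound_11, bound_6) in nats; q must be strictly positive."""
    tv = sum(abs(pa - qa) for pa, qa in zip(p, q))           # |P - Q|, see (5)
    d = sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0.0)
    q_min = min(q)                                           # (7)
    beta2 = min(pa / qa for pa, qa in zip(p, q))             # (9)
    bound_11 = math.log1p(tv ** 2 / (2.0 * q_min))           # (11)
    bound_10 = bound_11 - 0.5 * beta2 * tv ** 2              # (10)
    bound_6 = tv ** 2 / q_min                                # (6)
    return d, bound_10, bound_11, bound_6

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
d, bound_10, bound_11, bound_6 = theorem1_bounds(p, q)
```

For this pair the chain $D(P\|Q) \le (10) \le (11) \le \tfrac12 \cdot (6)$ holds, which reflects the factor-of-2 improvement over (6).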

Proof: Theorem 1 is proved by obtaining upper and
lower bounds on the $\chi^2$-divergence from $P$ to $Q$,

$$\chi^2(P\|Q) = \sum_{a \in \mathcal{A}} \frac{(P(a)-Q(a))^2}{Q(a)}. \tag{12}$$

A lower bound follows by invoking Jensen's inequality:

$$\chi^2(P\|Q) = \sum_{a \in \mathcal{A}} \frac{P^2(a)}{Q(a)} - 1 \tag{13}$$

$$= \sum_{a \in \mathcal{A}} P(a) \exp\left(\log \frac{P(a)}{Q(a)}\right) - 1 \tag{14}$$

$$\ge \exp\left(\sum_{a \in \mathcal{A}} P(a) \log \frac{P(a)}{Q(a)}\right) - 1 \tag{15}$$

$$= \exp\bigl(D(P\|Q)\bigr) - 1. \tag{16}$$

Alternatively, (16) can be obtained by combining the equality

$$\chi^2(P\|Q) = \exp\bigl(D_2(P\|Q)\bigr) - 1 \tag{17}$$

with the monotonicity of the Rényi divergence $D_\alpha(P\|Q)$ in
$\alpha$, which implies that $D_2(P\|Q) \ge D(P\|Q)$.

A refined version of (16) is derived in the following. The
starting point is a refined version of Jensen's inequality in
[20, Lemma 1] (generalizing a result from Q Theorem 1]),
which leads to (see ll^ Theorem 7])

$$\min_{a \in \mathcal{A}} \frac{P(a)}{Q(a)} \cdot D(Q\|P) \le \log\bigl(1 + \chi^2(P\|Q)\bigr) - D(P\|Q) \tag{18}$$

$$\le \max_{a \in \mathcal{A}} \frac{P(a)}{Q(a)} \cdot D(Q\|P). \tag{19}$$

From (18) and the definition of $\beta_2$ in (9), we have

$$\chi^2(P\|Q) \ge \exp\bigl(D(P\|Q) + \beta_2 D(Q\|P)\bigr) - 1 \tag{20}$$

$$\ge \exp\left(D(P\|Q) + \frac{\beta_2 \log e}{2} \cdot |P-Q|^2\right) - 1 \tag{21}$$

where (21) follows from (20) and Pinsker's inequality (1) applied
to $D(Q\|P)$. Note that the lower bound in (21) refines the lower
bound in (16) since $\beta_2 \in [0,1]$.


An upper bound on $\chi^2(P\|Q)$ is derived as follows:

$$\chi^2(P\|Q) = \sum_{a \in \mathcal{A}} \frac{(P(a)-Q(a))^2}{Q(a)} \tag{22}$$

$$\le \frac{|P-Q|}{Q_{\min}} \, \max_{a \in \mathcal{A}} |P(a) - Q(a)| \tag{23}$$

and, from (3),

$$|P-Q| \ge 2 \max_{a \in \mathcal{A}} |P(a) - Q(a)|. \tag{24}$$

Combining (23) and (24) yields

$$\chi^2(P\|Q) \le \frac{|P-Q|^2}{2 Q_{\min}}. \tag{25}$$

Finally, (10) follows by combining the upper and lower
bounds on the $\chi^2$-divergence in (21) and (25). ■
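
The proof sandwiches the $\chi^2$-divergence between the lower bounds (16), (20), (21) and the upper bound (25). A minimal numerical check of that chain, assuming natural logarithms and strictly positive example distributions chosen here only for illustration, could look as follows.

```python
import math

def kl(p, q):
    """Relative entropy in nats for strictly positive p and q."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

tv = sum(abs(pa - qa) for pa, qa in zip(p, q))
q_min = min(q)
beta2 = min(pa / qa for pa, qa in zip(p, q))

chi2 = sum((pa - qa) ** 2 / qa for pa, qa in zip(p, q))      # (12)
lower_16 = math.exp(kl(p, q)) - 1.0                          # (16)
lower_20 = math.exp(kl(p, q) + beta2 * kl(q, p)) - 1.0       # (20)
lower_21 = math.exp(kl(p, q) + 0.5 * beta2 * tv ** 2) - 1.0  # (21)
upper_25 = tv ** 2 / (2.0 * q_min)                           # (25)
```

Taking logarithms of the outer inequality $(21) \le (25)$ and rearranging gives exactly the bound (10).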

Remark 1. It is easy to check that Theorem 1 strengthens
the bound by Csiszár and Talata in (6) by at least a factor
of 2 since upper bounding the logarithm in (10) gives

$$D(P\|Q) \le \frac{(1 - \beta_2 Q_{\min}) \log e}{2 Q_{\min}} \, |P-Q|^2. \tag{26}$$

In the finite-alphabet case, we can obtain another upper
bound on $D(P\|Q)$ as a function of the $\ell_2$ norm $\|P-Q\|_2$:

$$D(P\|Q) \le \log\left(1 + \frac{\|P-Q\|_2^2}{Q_{\min}}\right) - \frac{\beta_2 \log e}{2} \, \|P-Q\|_2^2 \tag{27}$$

which follows by combining (21), (22), and $\|P-Q\|_2 \le |P-Q|$.
Using the inequality $\log(1+x) \le x \log e$ for $x \ge 0$
in the right side of (27), and also loosening this bound by
ignoring the term $\frac{\beta_2 \log e}{2} \, \|P-Q\|_2^2$, we recover the bound

$$D(P\|Q) \le \frac{\|P-Q\|_2^2 \log e}{Q_{\min}} \tag{28}$$

which appears in the proof of Property 4 of ETl Lemma 7],
and is also used in [12, (174)].

Remark 2. The lower bounds on the $\chi^2$-divergence in (16)
and (21) improve the one in [6, Lemma 6.3] which states
that $D(P\|Q) \le \chi^2(P\|Q) \log e$.


Remark 3. Reverse Pinsker inequalities have also been
derived in quantum information theory (d, El), providing
upper bounds on the relative entropy of two quantum states
as a function of the trace norm distance when the minimal
eigenvalues of the states are positive (cf. d Theorem 6]
and d Theorem 1]). These types of bounds are akin to
the weakened form in (11). When the variational distance is
much smaller than the minimal eigenvalue (see m Eq. (57)]),
the latter bounds have a quadratic scaling in this distance,
similarly to (11); they are also inversely proportional to the
minimal eigenvalue, similarly to the dependence of (11) on
$Q_{\min}$.





















3. Applications of Theorem 1

A. The Exponential Decay of the Probability for a Non- 
Typical Sequence 

To exemplify the utility of Theorem 1, we bound the
function

$$L_\delta(Q) = \min_{P \notin \mathcal{T}_\delta(Q)} D(P\|Q) \tag{29}$$

where we have denoted the subset of probability measures
on $(\mathcal{A}, \mathcal{F})$ which are $\delta$-close to $Q$ as

$$\mathcal{T}_\delta(Q) = \bigl\{ P \colon \forall a \in \mathcal{A}, \; |P(a) - Q(a)| \le \delta \, Q(a) \bigr\}. \tag{30}$$

Note that $(a_1, \ldots, a_n)$ is strongly $\delta$-typical according to $Q$
if its empirical distribution belongs to $\mathcal{T}_\delta(Q)$. According to
Sanov's theorem (e.g. [5, Theorem 11.4.1]), if the random
variables $Y_1, \ldots, Y_n$ are independent and distributed according
to $Q$, then the probability that $(Y_1, \ldots, Y_n)$ is not $\delta$-typical
vanishes exponentially with exponent $L_\delta(Q)$.
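
The membership test in (30) can be sketched directly: compute the empirical distribution of a sequence and compare it with $Q$ symbol by symbol. The function name, the dictionary representation of $Q$ and the example sequences below are illustrative assumptions.

```python
from collections import Counter

def is_strongly_typical(sequence, q, delta):
    """Check whether the empirical distribution of `sequence` lies in T_delta(Q).

    `q` maps each symbol of the alphabet to its (strictly positive) probability.
    """
    n = len(sequence)
    counts = Counter(sequence)
    # A symbol outside the alphabet of Q rules out typicality.
    if any(symbol not in q for symbol in counts):
        return False
    # Condition (30): |P_hat(a) - Q(a)| <= delta * Q(a) for every symbol a.
    return all(abs(counts[a] / n - qa) <= delta * qa for a, qa in q.items())

q = {"a": 0.2, "b": 0.3, "c": 0.5}
matches_q = is_strongly_typical("aabbbccccc", q, delta=0.5)   # empirical = Q
too_many_a = is_strongly_typical("aaaabbbccc", q, delta=0.5)  # P_hat(a) = 0.4
```

Note that the tolerance in (30) is relative: a symbol with small probability $Q(a)$ leaves little absolute room for deviation.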

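To state the next result, we invoke the following notions
from [14]. Given a probability measure $Q$, its balance
coefficient is given by

$$\beta_Q = \inf_{A \in \mathcal{F} \colon Q(A) \ge \frac{1}{2}} Q(A). \tag{31}$$

The function $\phi \colon (0, \tfrac{1}{2}] \to [\tfrac{1}{2} \log e, \infty)$ is given by

$$\phi(p) = \begin{cases} \dfrac{1}{4(1-2p)} \log\left(\dfrac{1-p}{p}\right), & p \in (0, \tfrac{1}{2}), \\[2mm] \tfrac{1}{2} \log e, & p = \tfrac{1}{2}. \end{cases} \tag{32}$$

Theorem 2. If $Q_{\min} > 0$, then

$$\phi(1 - \beta_Q) \, Q_{\min}^2 \, \delta^2 \le L_\delta(Q) \tag{33}$$

$$\le \log\bigl(1 + 2 Q_{\min} \delta^2\bigr) \tag{34}$$

where (34) holds if $\delta \le Q_{\min}^{-1} - 1$.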

Proof: Ordentlich and Weinberger [14, Section 4] show
the refinement of Pinsker's inequality:

$$\phi(1 - \beta_Q) \, |P-Q|^2 \le D(P\|Q). \tag{35}$$

Note that if $Q_{\min} > 0$ then $\beta_Q \le 1 - Q_{\min} < 1$, and therefore
$\phi(1 - \beta_Q)$ is well defined and finite. If $P \notin \mathcal{T}_\delta(Q)$, the simple
bound

$$|P-Q| > \delta \, Q_{\min} \tag{36}$$

together with (35) yields (33).

The upper bound (34) follows from (11) and the fact that
if $\delta \le Q_{\min}^{-1} - 1$, then

$$\min_{P \notin \mathcal{T}_\delta(Q)} |P-Q| = 2 \delta \, Q_{\min}. \tag{37}$$

■

If $\delta \le Q_{\min}^{-1} - 1$, the ratio between the upper and lower
bounds in (33) and (34) satisfies

$$\frac{1}{Q_{\min}} \cdot \frac{\log e}{2 \phi(1 - \beta_Q)} \cdot \frac{\log\bigl(1 + 2 Q_{\min} \delta^2\bigr)}{\tfrac{1}{2} \log e \; Q_{\min} \delta^2} \le \frac{4}{Q_{\min}} \tag{38}$$

where (38) follows from the fact that its second and third
factors are less than or equal to 1 and 4, respectively. Note
that the bounds in (33) and (34) scale like $\delta^2$ for $\delta \approx 0$.
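
The quantities in Theorem 2 can be evaluated by brute force on a small alphabet. The sketch below assumes natural logarithms; it takes $\phi$ with the normalization $\phi(\tfrac12) = \tfrac12 \log e$, enumerates all events to obtain the balance coefficient (31), and uses an example $Q$, $\delta$ and test distribution $P$ that are illustrative choices only.

```python
import math
from itertools import combinations

def balance_coefficient(q):
    """beta_Q in (31): smallest Q(A) over events A with Q(A) >= 1/2 (brute force)."""
    best = 1.0
    for r in range(1, len(q) + 1):
        for subset in combinations(q, r):
            mass = sum(subset)
            if mass >= 0.5:
                best = min(best, mass)
    return best

def phi(p):
    """phi(p) in nats for p in (0, 1/2], with phi(1/2) = 1/2."""
    if abs(1.0 - 2.0 * p) < 1e-12:
        return 0.5
    return math.log((1.0 - p) / p) / (4.0 * (1.0 - 2.0 * p))

q = [0.1, 0.3, 0.6]
delta = 0.5                      # satisfies delta <= 1/Q_min - 1 = 9
q_min = min(q)
beta_q = balance_coefficient(q)

lower_33 = phi(1.0 - beta_q) * q_min ** 2 * delta ** 2
upper_34 = math.log1p(2.0 * q_min * delta ** 2)
ratio = upper_34 / lower_33      # bounded by 4 / Q_min according to (38)

# A distribution just outside T_delta(Q): the mass of the least likely
# symbol deviates by slightly more than delta * Q_min.
p = [0.15001, 0.3, 0.54999]
d = sum(pa * math.log(pa / qa) for pa, qa in zip(p, q))
```

The enumeration in `balance_coefficient` is exponential in the alphabet size, so it is only meant for small examples; the comparison `mass >= 0.5` is also sensitive to floating-point rounding when an event has mass exactly one half.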


B. Distance from Equiprobable 

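If $P$ is a distribution on a finite set $\mathcal{A}$, $H(P)$ gauges the
"distance" from $U$, the equiprobable distribution, since

$$H(P) = \log|\mathcal{A}| - D(P\|U). \tag{39}$$

Thus, it is of interest to explore the relationship between
$H(P)$ and $|P-U|$. Particularizing (1), lIH (2.2)] (see also
112?] pp. 30-31]), and (11), we obtain

$$|P-U| \le \sqrt{\frac{2}{\log e} \bigl(\log|\mathcal{A}| - H(P)\bigr)}, \tag{40}$$

$$|P-U| \le 2 \sqrt{1 - \frac{1}{|\mathcal{A}|} \exp\bigl(H(P)\bigr)}, \tag{41}$$

$$|P-U| \ge \sqrt{2 \left(\exp\bigl(-H(P)\bigr) - \frac{1}{|\mathcal{A}|}\right)}, \tag{42}$$

respectively.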
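
The three bounds (40)-(42) depend on $P$ only through its entropy and the alphabet size, so they are simple to tabulate. The following sketch assumes natural logarithms (entropy in nats, $\log e = 1$, and $\exp$ the matching inverse); the helper names and the example distribution are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats."""
    return -sum(pa * math.log(pa) for pa in p if pa > 0.0)

def equiprobable_bounds(p):
    """Return (|P - U|, bound (40), bound (41), bound (42)) for a finite alphabet."""
    n = len(p)
    h = entropy(p)
    tv = sum(abs(pa - 1.0 / n) for pa in p)
    # max(0, .) guards against tiny negative values caused by rounding.
    upper_40 = math.sqrt(max(0.0, 2.0 * (math.log(n) - h)))
    upper_41 = 2.0 * math.sqrt(max(0.0, 1.0 - math.exp(h) / n))
    lower_42 = math.sqrt(max(0.0, 2.0 * (math.exp(-h) - 1.0 / n)))
    return tv, upper_40, upper_41, lower_42

tv, upper_40, upper_41, lower_42 = equiprobable_bounds([0.7, 0.1, 0.1, 0.1])
```

For this example with $|\mathcal{A}| = 4$, the Pinsker-based bound (40) is the tighter of the two upper bounds; which of (40) and (41) is tighter depends on $H(P)$.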



Fig. 1. Bounds on $|P-U|$ as a function of $H(P)$ for $|\mathcal{A}| = 4$ and
$|\mathcal{A}| = 16$. The point $(H(P), |P-U|) = (0, 2(1 - |\mathcal{A}|^{-1}))$ is depicted on
the $y$-axis. In the curves of the two plots, the bounds (a), (b) and (c) refer,
respectively, to (40), (41) and (42).

The bounds in (40)-(42) are illustrated for $|\mathcal{A}| = 4, 16$ in
Figure 1. For $H(P) = 0$, $|P-U| = 2(1 - |\mathcal{A}|^{-1})$ is shown
for reference in Figure 1; as the cardinality of the alphabet
increases, the gap between $|P-U|$ and its upper bound is
reduced (and this gap decays asymptotically to zero).

Results on the more general problem of finding bounds 
on \H{P) — H{Q)\ based on \P — Q\ can be found in 13] 
Theorem 17.3.3], Oil, Qb), QS), ESj Section 1.7] and lEll. 


4. Extension of Theorem 1 to Rényi Divergences

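Definition 1. The Rényi divergence of order $\alpha \in [0, \infty]$ from
$P$ to $Q$ is defined for $\alpha \in (0,1) \cup (1, \infty)$ as

$$D_\alpha(P\|Q) = \frac{1}{\alpha - 1} \log\left(\sum_{a \in \mathcal{A}} P^\alpha(a) \, Q^{1-\alpha}(a)\right). \tag{43}$$

Recall that $D_1(P\|Q) = D(P\|Q)$ is defined to be the
analytic extension of $D_\alpha(P\|Q)$ at $\alpha = 1$ (if $D(P\|Q) < \infty$,
L'Hôpital's rule gives that $D(P\|Q) = \lim_{\alpha \uparrow 1} D_\alpha(P\|Q)$).
The extreme cases of $\alpha = 0, \infty$ are defined as follows:

• If $\alpha = 0$ then $D_0(P\|Q) = -\log Q(\mathrm{Support}(P))$,

• If $\alpha = +\infty$ then $D_\infty(P\|Q) = \log\left(\operatorname{ess\,sup} \frac{P}{Q}\right)$.

Pinsker's inequality was extended by Gilardoni [10] for
a Rényi divergence of order $\alpha \in (0,1]$ (see also 0 Theorem 30]),
and it takes the form

$$\frac{\alpha}{2} \, |P-Q|^2 \log e \le D_\alpha(P\|Q).$$

A tight lower bound on the Rényi divergence of order $\alpha > 0$
as a function of the total variation distance is given in ifTOll,
which is consistent with Vajda's tight lower bound for
$f$-divergences in [23, Theorem 3].

Motivated by these findings, we extend the upper bound
on the relative entropy in Theorem 1 to Rényi divergences
of an arbitrary order.

Theorem 3. Assume that $P$, $Q$ are strictly positive with
minimum masses denoted by $P_{\min}$ and $Q_{\min}$, respectively.
Let $\beta_1$ and $\beta_2$ be given in (8) and (9), respectively, and
abbreviate $\delta = \tfrac{1}{2} |P-Q| \in [0,1]$. Then, the Rényi divergence
of order $\alpha \in [0, \infty]$ satisfies

$$D_\alpha(P\|Q) \le \begin{cases} f_1, & \alpha \in (2, \infty], \\ f_2, & \alpha \in [1, 2], \\ \min\{f_2, f_3, f_4\}, & \alpha \in (\tfrac{1}{2}, 1), \\ \min\left\{2 \log\left(\dfrac{1}{1-\delta}\right), f_2, f_3, f_4\right\}, & \alpha \in [0, \tfrac{1}{2}] \end{cases} \tag{44}$$

where, for $\alpha \in [0, \infty]$,

$$f_1(\alpha, \beta_1, \delta) = \begin{cases} \dfrac{1}{\alpha - 1} \log\left(1 + \dfrac{\delta \, (\beta_1^{1-\alpha} - 1)}{1 - \beta_1}\right), & \alpha \in [0,1) \cup (1, \infty), \\[2mm] \dfrac{\delta}{1 - \beta_1} \log \dfrac{1}{\beta_1}, & \alpha = 1, \\[2mm] \log \dfrac{1}{\beta_1}, & \alpha = \infty, \end{cases} \tag{45}$$

for $\alpha \in [0, 2]$,

$$f_2(\alpha, \beta_1, Q_{\min}, \delta) = \min\left\{ f_1(\alpha, \beta_1, \delta), \; \log\left(1 + \frac{2\delta^2}{Q_{\min}}\right) \right\}, \tag{46}$$

and, for $\alpha \in [0, 1)$, $f_3$ and $f_4$ are given by

$$f_3(\alpha, P_{\min}, \beta_1, \delta) = \frac{\alpha}{1 - \alpha} \left[ \log\left(1 + \frac{2\delta^2}{P_{\min}}\right) - 2 \beta_1 \delta^2 \log e \right], \tag{47}$$

$$f_4(\beta_2, Q_{\min}, \delta) = \min\left\{ \log\left(1 + \frac{2\delta^2}{Q_{\min}}\right) - 2 \beta_2 \delta^2 \log e, \; \log\left(1 + \frac{\min\{\delta, 2\delta^2\}}{Q_{\min}}\right) \right\}. \tag{48}$$

Proof: See [20, Section 7.C].

Remark 4. A simple bound, albeit looser than the one in
Theorem 3, is

$$D_\alpha(P\|Q) \le \log\left(1 + \frac{|P-Q|}{2 Q_{\min}}\right) \tag{49}$$

which is asymptotically tight as $\alpha \to \infty$ in the case of a
binary alphabet with equiprobable $Q$.

Example 1. Figure 2 illustrates the bound in (45), which
is valid for all $\alpha \in [0, \infty]$ (see [20, Theorem 23]), and the
upper bounds of Theorem 3 in the case of binary alphabets.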

5. Summary 

We derive in this paper some "reverse Pinsker inequalities"
for probability measures $P \ll Q$ defined on a common
finite set, which provide lower bounds on the total variation
distance $|P-Q|$ as a function of the relative entropy $D(P\|Q)$
under the assumption of a bounded relative information or
$Q_{\min} > 0$. More general results for an arbitrary alphabet are
available in [20, Section 5].

In [20], we study bounds among various $f$-divergences,
dealing with arbitrary alphabets and deriving bounds on the
ratios of various distance measures. New expressions of
the Rényi divergence in terms of the relative information
spectrum are derived, leading to upper and lower bounds on
the Rényi divergence in terms of the variational distance.



















Fig. 2. The Rényi divergence $D_\alpha(P\|Q)$ for $P$ and $Q$ which are defined
on a binary alphabet with $P(0) = Q(1) = 0.65$, compared to (a) its upper
bound in (44), and (b) its upper bound in (45) (see [20, Theorem 23]). The
two bounds coincide here when $\alpha \in (1, 1.291) \cup (2, \infty)$.